Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information
Abstract
Hierarchical clustering is a popular method for grouping together similar elements based on a distance measure between them. In many cases, annotations for some elements are known beforehand, which can aid the clustering process. We present a novel approach for decomposing a hierarchical clustering into the clusters that optimally match a set of known annotations, as measured by the variation of information metric. Our approach is general and does not require the user to enter the number of clusters desired. We apply it to two biological domains: finding protein complexes within protein interaction networks and identifying species within metagenomic DNA samples. For these two applications, we test the quality of our clusters by using them to predict complex and species membership, respectively. We find that our approach generally outperforms the commonly used heuristic methods.
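As a rough illustration of the metric named in the abstract, the sketch below computes the variation of information between two labelings of the same items, using the standard definition VI(A, B) = H(A|B) + H(B|A). It is not the authors' tree-decomposition procedure, which the abstract does not spell out; the function name `variation_of_information` and the toy labelings are assumptions chosen for this example.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A|B) + H(B|A) in nats.

    labels_a and labels_b are equal-length sequences giving the cluster
    (or annotation) label of each item. Hypothetical helper for illustration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n == 0:
        return 0.0

    count_a = Counter(labels_a)                  # cluster sizes in A
    count_b = Counter(labels_b)                  # cluster sizes in B
    count_ab = Counter(zip(labels_a, labels_b))  # overlap sizes

    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # n_ab / count_a[a] is p(b|a); n_ab / count_b[b] is p(a|b)
        vi -= p_ab * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi


# Toy example: a candidate clustering compared with known annotations.
clusters = [0, 0, 1, 1]
annotations = ["x", "x", "y", "y"]
print(variation_of_information(clusters, annotations))
```

A value of zero means the two labelings define the same partition, and larger values mean greater disagreement, so a candidate cut of a hierarchical tree could be scored against known annotations by minimizing this quantity.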
Similar resources
Finding Biologically Accurate Clusterings in Hierarchical Decompositions Using the Variation of Information
Hierarchical clustering is a popular method for grouping together similar items based on a distance measure between them. These clusters can be used to infer annotations for uncharacterized items. However, in many cases, annotation information for some elements is known beforehand. We present a novel approach for decomposing a hierarchical clustering into the optimal clusters that match a set o...
DIAGNOSIS OF BREAST LESIONS USING THE LOCAL CHAN-VESE MODEL, HIERARCHICAL FUZZY PARTITIONING AND FUZZY DECISION TREE INDUCTION
Breast cancer is one of the leading causes of death among women. Mammography remains today the best technology to detect breast cancer, early and efficiently, to distinguish between benign and malignant diseases. Several techniques in image processing and analysis have been developed to address this problem. In this paper, we propose a new solution to the problem of computer aided detection and...
Identification of tree species in mixed broadleaf stands of the Caspian (Hyrcanian) forests using UAV images (case study: Darabkola forest)
Unmanned aerial vehicle (UAV) images have high spatial resolution. They are a valuable source of information for mapping land cover and thematic information, particularly for identifying tree species. The aim of this study was to investigate the capability of drone images and the object-based method for detecting tree species in the Hyrcanian forests. For this purpose, part of an area...
Quadtree and Octree Grid Generation
Engineering analysis often involves the accurate numerical solution of boundary value problems in discrete form. Hierarchical quadtree (or octree) grid generation offers an efficient method for the spatial discretisation of arbitrary-shaped two- (or three-) dimensional domains. It consists of recursive algebraic splitting of sub-domains into quadrants (or cubes), leading to an ordered hierarchi...
Determining Difference in Evolutionary Variation of Bacterial RecA proteins vs 16SrRNA Genes by using 16s_Toxonomy Tree
Background and Aims: The rate of variation in various genes of a bacterial species is different during evolution. Therefore, in systematic bacterial studies many researchers compare the phylogenetic tree of a particular gene to the standard tree of an rRNA gene. Regarding the importance of 16SrRNA gene and the evolutional process of RecA protein family, we investigated the changes in the select...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 17, Issue 3
Pages: -
Publication date: 2009